Analysis of simulated and observed snow water equivalent (SWE) dynamics in western North America over the Common Era.
Import the packages required for this analysis.
library(raster) # processing raster data
library(tidyverse) # data manipulation and visualization
library(mgcv) # flexible nonlinear regression models
library(furrr) # parallel processing
library(sf) # shapefiles and plotting
library(gganimate) # animated gifs
library(ggridges) # ridge plots
library(broom) # tidying model objects
Define a study area to constrain all computaitons.
bbox <- extent(c(-125, -104, 33, 50))
Import the snow observation data from https://nsidc.org/data/nsidc-0719.1 What’s up with this warning message? long_name=CRS definition spatial_ref=GEOGCS[“NAD83”,DATUM[“North_American_Datum_1983”,SPHEROID[“GRS 1980”,6378137,298.257222101,AUTHORITY[“EPSG”,“7019”]],AUTHORITY[“EPSG”,“6269”]],PRIMEM[“Greenwich”,0,AUTHORITY[“EPSG”,“8901”]],UNIT[“degree”,0.01745329251994328,AUTHORITY[“EPSG”,“9122”]],AUTHORITY[“EPSG”,“4269”]] GeoTransform=-125.0208 0.04166662697178698 0 49.9375 0 -0.04166662697178698. First define a function to extract the map for April 1st from each annual file, then run it on the SWE and snow depth data.
preprocess <- function(x, var, flip = FALSE, regrid = FALSE) {
maps <- brick(x, varname = var)
indices <- getZ(maps) %>% # find the index for April 1st
str_detect('-04-01') %>%
which()
subset(maps, indices) %>%
{if(flip) rotate(.) else .} %>%
crop(bbox) %>%
{if(regrid) aggregate(., fact = 2, na.rm = TRUE) else .} # resample to lower res to speed up analysis
}
plan(multisession) # setup parallel processing
# import SWE data
swe_obs <- list.files('data', pattern = 'SWE_Depth', full.names = TRUE) %>%
future_map(preprocess, var = 'SWE', regrid = TRUE) %>%
brick()
# import depth data
depth_obs <- list.files('data', pattern = 'SWE_Depth', full.names = TRUE) %>%
future_map(preprocess, var = 'DEPTH', regrid = TRUE) %>%
brick()
Import SRTM elevation and resample to the resolution of the observations. Also calculate the area of each grid cell.
elev <- raster('~/gdrive/Data/SRTM_1km.tif') %>%
crop(bbox) %>%
resample(swe_obs) %>%
brick(., area(.))
Turn all the rasters into data frames and join.
swe_dat <- swe_obs %>%
as.data.frame(xy = TRUE, na.rm = TRUE, long = TRUE) %>%
mutate(year = round(parse_number(layer))) %>%
rename(SWE = value) %>%
select(-layer)
depth_dat <- depth_obs %>%
as.data.frame(xy = TRUE, na.rm = TRUE, long = TRUE) %>%
mutate(year = round(parse_number(layer))) %>%
rename(DEPTH = value) %>%
select(-layer)
elev_dat <- elev %>%
as.data.frame(xy = TRUE, na.rm = TRUE) %>%
rename(elev = SRTM_1km, area = layer)
snow_dat <- swe_dat %>%
left_join(depth_dat, by = c('x', 'y', 'year')) %>%
left_join(elev_dat, by = c('x', 'y'))
Look at how SWE has varied over time and space.
## Warning: Duplicated aesthetics after name standardisation: colour
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Some aggregate statistics. Is this the right way to aggregate?
snow_agg <- snow_dat %>%
filter(SWE > 0) %>%
group_by(year) %>%
summarise(SWE = sum(SWE * area), # units in megaliters (?)
area = sum(area))
How has total SWE varied over the observational period?
What about total area covered?
cesm_h2osno <- preprocess('~/Downloads/b.e11.BLMTRC5CN.f19_g16.001.clm2.h1.H2OSNO.08500101-18491231.nc',
var = 'H2OSNO',
flip = TRUE)
## Loading required namespace: ncdf4
cesm_h2osno_ext <- preprocess('~/Downloads/b.e11.BLMTRC5CN.f19_g16.001.clm2.h1.H2OSNO.18500101-20051231.nc',
var = 'H2OSNO',
flip = TRUE)
cesm_areas <- area(cesm_h2osno) %>%
as.data.frame(xy = TRUE, na.rm = TRUE) %>%
rename(area = layer)
still need to account for snow fraction of grid cell? and land fraction too?
cesm_dat <- c(cesm_h2osno, cesm_h2osno_ext) %>%
brick() %>%
as.data.frame(xy = TRUE, na.rm = TRUE, long = TRUE) %>%
mutate(year = str_sub(layer, 2) %>% parse_number() %>% round) %>%
rename(SWE = value) %>%
select(-layer) %>%
left_join(cesm_areas, by = c('x', 'y'))
## Warning: Duplicated aesthetics after name standardisation: colour
cesm_agg <- cesm_dat %>%
group_by(year) %>%
summarise(SWE = sum(SWE * area, na.rm = TRUE)) %>%
mutate(avg = zoo::rollmean(SWE, k = 100, na.pad = TRUE))
## Warning: Removed 99 rows containing missing values (geom_path).
So there’s a clear bias here. The mean of the observations is well higher than it should be. Looks like the simulation underpredicts snow – likely because of all the small-scale topographic influences that CESM can’t resolve.
## Picking joint bandwidth of 2.1e+07
Let’s compare the observations and simulation directly.
What’s the climatic mean April 1st SWE for the observations and simulations?
## Warning: Duplicated aesthetics after name standardisation: colour
To what extent is this mistmatch due to bias or scale effects? Test by resampling the observations to the CESM LME grid.
## Warning: Duplicated aesthetics after name standardisation: colour
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
This comparison suggests that CESM is underpredicting SWE because of the simplified topography. Check this by looking at the underlying topography boundary files for that simulation.
surface_geop <- raster('data/consistent-topo-fv1.9x2.5_c130424.nc',
var = 'PHIS') %>%
rotate() %>%
crop(bbox, snap = 'out') %>%
`/`(9.80665) %>%
as.data.frame(na.rm = TRUE, xy = TRUE)
landfrac <- raster('data/consistent-topo-fv1.9x2.5_c130424.nc',
var = 'LANDFRAC') %>%
rotate() %>%
crop(bbox, snap = 'out')
## Warning: Duplicated aesthetics after name standardisation: colour
Might be interesting to look at snow centroid date too down the road. Spatial patterns notwithstanding, maybe the aggregate temporal patterns are still recoverable?
Now run a principal components analysis on the observed SPEI data. Much of the code that follows is adapted from the wql and sinkr packages.
obs_pca <- swe_dat %>%
group_by(x, y) %>%
mutate(test = all(SWE == 0)) %>%
filter(test == FALSE) %>%
ungroup() %>%
select(-test)%>%
spread(year, SWE) %>%
select(-x, -y) %>%
t() %>%
prcomp(scale. = TRUE)
sim_pca <- cesm_dat %>%
filter(year > 1900) %>%
group_by(x, y) %>%
mutate(test = all(SWE == 0)) %>%
filter(test == FALSE) %>%
ungroup() %>%
select(-test) %>%
spread(year, SWE) %>%
select(-x, -y, -area) %>%
t() %>%
prcomp(scale. = TRUE)
eofs_obs <- obs_pca %>% # calculate unrotated EOFs
tidy(matrix = 'variables') %>%
filter(PC <= 2) %>%
left_join(obs_eigs[1:2], by = 'PC') %>%
mutate(eof = value * std.dev,
PC = as.character(PC)) %>%
select(-std.dev,-value)
eofs_sim <- sim_pca %>% # calculate unrotated EOFs
tidy(matrix = 'variables') %>%
filter(PC <= 4) %>%
left_join(sim_eigs[1:2], by = 'PC') %>%
mutate(eof = value * std.dev,
PC = as.character(PC)) %>%
select(-std.dev,-value)
## Joining, by = "column"
## Warning: Duplicated aesthetics after name standardisation: colour
## Joining, by = "column"
## Warning: Duplicated aesthetics after name standardisation: colour
m1 <- bam(SWE ~ te(x, y, by = year, k = 20), data = snow_dat)
summary(m1)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## SWE ~ te(x, y, by = year, k = 20)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1056.66 19.45 54.34 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(x,y):year 382 383.8 3717 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.475 Deviance explained = 47.5%
## fREML = 9.8714e+06 Scale est. = 16088 n = 1576188
Looks good down here, but these are in absolute numbers. Maybe rescale to look at relative change?
snow_dat %>%
filter(year == 1999) %>%
mutate(pred = predict(m1, ., type = 'response')) %>%
ggplot(aes(x, y)) +
geom_raster(aes(fill = pred)) +
scale_fill_distiller(palette = 'RdBu', direction = 1, limits = c(-1000, 1000))
First let’s model the relationship between snow depth and snow water equivalent (i.e. snow density) in the observations.
snow_dat %>%
ggplot(aes(DEPTH, SWE)) +
geom_point()
m2 <- bam(SWE ~ s(DEPTH), data = snow_dat, discrete = TRUE)
plot(m2)
summary(m2)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## SWE ~ s(DEPTH)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1165.0027 0.3417 3409 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(DEPTH) 8.824 8.981 8550156 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.98 Deviance explained = 98%
## fREML = 7.2993e+06 Scale est. = 616.49 n = 1576188
So the question now is whether the mismatch between observed and simulated SWE is due to the meteorolgical forcing. That is, if only we had better temperature and precipitation simulations then we could model SWE as a simple function of those. To examine this, use reanalysis data.
Start by importing ERA5 data.
ncdf_file <- 'data/adaptor.mars.internal-1566933160.7965047-18657-23-b3ae7b43-15db-4890-bc5b-4d3413fe4ddd.nc'
ref <- brick(ncdf_file, var = 'sd') %>%
.[[1]]%>%
rotate %>%
crop(bbox)
elev2 <- resample(elev[[1]], ref)
We need to use snowdepth rather than SWE, but as shown above this isn’t a huge deal in the long run. Here just import April snowdepths, which we’ll use as a rough equivalent for April 1st SWE.
snowdepth <- brick(ncdf_file, var = 'sd') %>%
.[[seq(4,484, 12)]] %>% # get april
rotate %>%
crop(bbox) %>%
mask(elev2) %>%
as.data.frame(xy = TRUE, long = TRUE, na.rm = TRUE) %>%
rename(time = layer, sd = value)
Also import temperature and precipiation for all months.
temp <- brick(ncdf_file, var = 't2m') %>%
rotate %>%
crop(bbox) %>%
mask(elev2) %>%
as.data.frame(xy = TRUE, long = TRUE, na.rm = TRUE) %>%
rename(time = Z, t2m = value)
precip <- brick(ncdf_file, var = 'tp') %>%
rotate %>%
crop(bbox) %>%
mask(elev2) %>%
as.data.frame(xy = TRUE, long = TRUE, na.rm = TRUE) %>%
rename(time = Z, tp = value)
Calculate the mean temperature and total precipitation for Oct-Mar to see if there’s any relationship to snowdepth.
dat <- precip %>%
left_join(temp, by = c('x', 'y', 'time')) %>%
mutate(t2m = t2m - 273.15) %>%
separate(time, into = c('year', 'month', 'day'), convert = TRUE) %>%
mutate(water_year = if_else(month < 10, year, year + 1L)) %>%
filter(between(water_year, 1980, 2018)) %>%
filter(month %in% c(1:3, 9:12)) %>%
group_by(x, y, water_year) %>%
summarise(t2m = mean(t2m), tp = sum(tp))
Combine all the data.
dat2 <- snowdepth %>%
mutate(time = str_sub(time, start = 2)) %>%
separate(time, into = c('year', 'month', 'day'), convert = TRUE) %>%
select(-day, -month) %>%
left_join(dat, by = c('x', 'y', 'year' = 'water_year')) %>%
remove_missing()
## Warning: Removed 10458 rows containing missing values.
The spatial and temporal variability of the reanalysis snow depth looks close to observations, so that’s a good sign.
## Warning: Duplicated aesthetics after name standardisation: colour
Fit a GAM predicting snow depth as a nonlinear interaction between seasonal temperature and precipitation. Include terms to account for spatial and temporal autocorrelation
mod <- dat2 %>%
mutate(start = if_else(year == min(year), TRUE, FALSE)) %>%
arrange(x, y) %>%
bam(sd ~ te(t2m, tp) + te(x,y), data = ., AR.start = start)
The model performs suprisingly well.
summary(mod)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## sd ~ te(t2m, tp) + te(x, y)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.445e-02 9.412e-05 366 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(t2m,tp) 23.82 23.98 18310.5 <2e-16 ***
## te(x,y) 23.81 23.99 462.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.758 Deviance explained = 75.8%
## fREML = -3.5493e+05 Scale est. = 0.0017987 n = 203931
plot(mod, scheme = 1)
Plotting the results, it does indeed look like we can predict SWE/snow depth, or at least climatological SWE, from temperature and precipitation data.
## Warning: Duplicated aesthetics after name standardisation: colour